The raw dataset, restaurants.csv, starts with
10018 observations across 34 attributes,
encompassing a broad range of restaurant information. In its initial
state, the data presents several challenges requiring immediate
preparation: critical analytical variables such as rating
and userRatingCount each contain 540 missing values, necessitating
filtering or imputation to maintain data integrity for quality
assessment.
Furthermore, the 21 service and amenity indicators (e.g., delivery,
servesCocktails) are read in as logical columns but are sparsely
populated, with some missing in more than half of the rows (e.g.,
curbsidePickup, freeParkingLot). The price columns are already
numeric, yet roughly a quarter of their values are missing.
Additionally, columns like id,
name, formattedAddress, and geographical
coordinates, while useful for identification, must be removed or handled
separately before conducting core statistical analysis and exploratory
data visualization.
- rating: The primary target variable for prediction/analysis.
- userRatingCount: Critical feature indicating popularity/credibility.
- primaryType: The category of restaurant (e.g., Mexican, Thai). Essential for segmentation.
- businessStatus: Crucial to filter out closed/temporarily closed restaurants if analyzing operational performance.
- priceStartUSD, priceEndUSD: Important for price segmentation, but they require cleaning.
- Service & Amenity Booleans (takeout, delivery, servesDinner, servesWine, servesCocktails, wheelchairAccessible… etc.): These may be valuable for feature engineering and market segmentation.
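Before choosing between filtering and imputation, it can help to tabulate how much of each column is missing. The following is a minimal sketch on a made-up data frame (the `toy` table and its values are assumptions for illustration, not rows from restaurants.csv):

```r
# Toy stand-in for the raw table; the values are made up for illustration.
toy <- data.frame(
  rating          = c(4.5, NA, 4.2, 3.9),
  userRatingCount = c(2822, NA, 339, 61),
  delivery        = c(TRUE, NA, NA, FALSE),
  priceStartUSD   = c(20, NA, NA, 10)
)

# Count and share of missing values per column, worst first.
na_count <- colSums(is.na(toy))
na_share <- round(na_count / nrow(toy), 2)
na_audit <- data.frame(na_count, na_share)[order(-na_count), ]
print(na_audit)
```

Columns near the top of such a table are candidates for imputation or removal, while columns with only a few gaps can usually be handled by dropping rows.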
# Load the dataset CSV
restuarant_data = read_csv("data/restaurants.csv")
## Rows: 10018 Columns: 34
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): id, name, formattedAddress, phone, businessStatus, primaryType, go...
## dbl (6): rating, userRatingCount, latitude, longitude, priceStartUSD, price...
## lgl (21): takeout, delivery, dineIn, curbsidePickup, reservable, servesLunch...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
cat("---Number of Observations and Attributes---\n")
## ---Number of Observations and Attributes---
cat("Number of Observations (# of rows): " , nrow(restuarant_data), "\n")
## Number of Observations (# of rows): 10018
cat("Number of Attributes (# of columns): " , ncol(restuarant_data), "\n")
## Number of Attributes (# of columns): 34
# The 'glimpse' function provides a transposed view of the data, which is great for viewing types
glimpse(restuarant_data)
## Rows: 10,018
## Columns: 34
## $ id <chr> "ChIJ--c8h4jRD4gRRY6i7bZEpZU", "ChIJ--eTT…
## $ name <chr> "Lady Gregory's Irish Bar & Restaurant", …
## $ rating <dbl> 4.5, 4.5, 4.5, 2.6, 4.0, NA, 4.2, NA, 4.7…
## $ userRatingCount <dbl> 2822, 572, 1132, 61, 835, NA, 339, NA, 13…
## $ formattedAddress <chr> "5260 N Clark St, Chicago, IL 60640, USA"…
## $ latitude <dbl> 41.97789, 41.79242, 41.81210, 41.87463, 4…
## $ longitude <dbl> -87.66856, -87.78884, -87.70782, -87.6686…
## $ phone <chr> "(773) 271-5050", "(773) 586-2828", "(773…
## $ businessStatus <chr> "OPERATIONAL", "OPERATIONAL", "OPERATIONA…
## $ primaryType <chr> "restaurant", "pizza_restaurant", "seafoo…
## $ takeout <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ delivery <lgl> TRUE, TRUE, FALSE, NA, TRUE, NA, TRUE, NA…
## $ dineIn <lgl> TRUE, NA, TRUE, TRUE, NA, TRUE, TRUE, TRU…
## $ curbsidePickup <lgl> NA, NA, FALSE, FALSE, NA, NA, TRUE, NA, F…
## $ reservable <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, NA, NA, …
## $ servesLunch <lgl> TRUE, NA, TRUE, TRUE, TRUE, NA, TRUE, TRU…
## $ servesDinner <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ servesBeer <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, NA, FAL…
## $ servesWine <lgl> TRUE, FALSE, FALSE, FALSE, FALSE, NA, FAL…
## $ liveMusic <lgl> FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, …
## $ servesCocktails <lgl> TRUE, FALSE, TRUE, FALSE, FALSE, NA, NA, …
## $ goodForChildren <lgl> TRUE, NA, TRUE, TRUE, NA, NA, TRUE, NA, N…
## $ acceptsCreditCards <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ acceptsDebitCards <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ acceptsCashOnly <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, NA, FA…
## $ acceptsNfc <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, NA, TRUE, T…
## $ freeParkingLot <lgl> NA, NA, TRUE, NA, TRUE, NA, TRUE, NA, NA,…
## $ freeStreetParking <lgl> TRUE, TRUE, TRUE, NA, TRUE, NA, TRUE, NA,…
## $ wheelchairAccessibleEntrance <lgl> TRUE, NA, TRUE, TRUE, TRUE, NA, TRUE, TRU…
## $ wheelchairAccessibleRestroom <lgl> TRUE, NA, TRUE, TRUE, NA, NA, TRUE, NA, T…
## $ wheelchairAccessibleSeating <lgl> TRUE, FALSE, TRUE, TRUE, NA, NA, NA, NA, …
## $ priceStartUSD <dbl> 20, 10, NA, 10, 10, NA, NA, NA, 10, 20, 1…
## $ priceEndUSD <dbl> 30, 20, NA, 20, 20, NA, NA, NA, 20, 30, 1…
## $ googleMapsUri <chr> "https://maps.google.com/?cid=10783100435…
# Summary provides min, max, median, mean, and quartiles for numeric columns
summary(restuarant_data)
## id name rating userRatingCount
## Length:10018 Length:10018 Min. :1.000 Min. : 1.0
## Class :character Class :character 1st Qu.:3.900 1st Qu.: 101.0
## Mode :character Mode :character Median :4.300 Median : 323.0
## Mean :4.171 Mean : 641.3
## 3rd Qu.:4.600 3rd Qu.: 760.0
## Max. :5.000 Max. :23596.0
## NA's :540 NA's :540
## formattedAddress latitude longitude phone
## Length:10018 Min. :41.64 Min. :-87.94 Length:10018
## Class :character 1st Qu.:41.81 1st Qu.:-87.77 Class :character
## Mode :character Median :41.89 Median :-87.70 Mode :character
## Mean :41.87 Mean :-87.72
## 3rd Qu.:41.94 3rd Qu.:-87.65
## Max. :42.02 Max. :-87.52
##
## businessStatus primaryType takeout delivery
## Length:10018 Length:10018 Mode :logical Mode :logical
## Class :character Class :character FALSE:119 FALSE:1252
## Mode :character Mode :character TRUE :9093 TRUE :7257
## NA's :806 NA's :1509
##
##
##
## dineIn curbsidePickup reservable servesLunch
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:300 FALSE:2220 FALSE:3191 FALSE:193
## TRUE :8279 TRUE :2343 TRUE :2959 TRUE :7912
## NA's :1439 NA's :5455 NA's :3868 NA's :1913
##
##
##
## servesDinner servesBeer servesWine liveMusic
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:316 FALSE:4071 FALSE:4199 FALSE:6618
## TRUE :7561 TRUE :2878 TRUE :2524 TRUE :560
## NA's :2141 NA's :3069 NA's :3295 NA's :2840
##
##
##
## servesCocktails goodForChildren acceptsCreditCards acceptsDebitCards
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:3833 FALSE:752 FALSE:51 FALSE:87
## TRUE :2522 TRUE :6475 TRUE :8347 TRUE :8658
## NA's :3663 NA's :2791 NA's :1620 NA's :1273
##
##
##
## acceptsCashOnly acceptsNfc freeParkingLot freeStreetParking
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:8983 FALSE:212 FALSE:217 FALSE:99
## TRUE :116 TRUE :7271 TRUE :4371 TRUE :5004
## NA's :919 NA's :2535 NA's :5430 NA's :4915
##
##
##
## wheelchairAccessibleEntrance wheelchairAccessibleRestroom
## Mode :logical Mode :logical
## FALSE:245 FALSE:193
## TRUE :6816 TRUE :5141
## NA's :2957 NA's :4684
##
##
##
## wheelchairAccessibleSeating priceStartUSD priceEndUSD
## Mode :logical Min. : 1.00 Min. : 10.00
## FALSE:375 1st Qu.: 10.00 1st Qu.: 20.00
## TRUE :5114 Median : 10.00 Median : 20.00
## NA's :4529 Mean : 13.45 Mean : 23.87
## 3rd Qu.: 10.00 3rd Qu.: 20.00
## Max. :100.00 Max. :100.00
## NA's :2550 NA's :2638
## googleMapsUri
## Length:10018
## Class :character
## Mode :character
##
##
##
##
# View all column names
colnames(restuarant_data)
## [1] "id" "name"
## [3] "rating" "userRatingCount"
## [5] "formattedAddress" "latitude"
## [7] "longitude" "phone"
## [9] "businessStatus" "primaryType"
## [11] "takeout" "delivery"
## [13] "dineIn" "curbsidePickup"
## [15] "reservable" "servesLunch"
## [17] "servesDinner" "servesBeer"
## [19] "servesWine" "liveMusic"
## [21] "servesCocktails" "goodForChildren"
## [23] "acceptsCreditCards" "acceptsDebitCards"
## [25] "acceptsCashOnly" "acceptsNfc"
## [27] "freeParkingLot" "freeStreetParking"
## [29] "wheelchairAccessibleEntrance" "wheelchairAccessibleRestroom"
## [31] "wheelchairAccessibleSeating" "priceStartUSD"
## [33] "priceEndUSD" "googleMapsUri"
# View data in highly formatted output
datatable(restuarant_data)
## Warning in instance$preRenderHook(instance): It seems your data is too big for
## client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
- id, googleMapsUri: Unique identifiers (non-analytical values).
- formattedAddress, phone: Not useful for statistical modeling without extensive NLP or geo-processing.
- name: Too high cardinality/unique.
- latitude, longitude: High dimensional, rarely used in simple models.
# drop attributes
# tidyverse uses `select` with negative sign (-)
clean_data = restuarant_data %>%
select(-id, -googleMapsUri, -formattedAddress, -phone, -name, -latitude, -longitude)
Will focus on:
- rating, userRatingCount: drop rows with missing values
- priceStartUSD, priceEndUSD: drop rows with missing values and ensure values are numeric
# Handle missing values in critical columns
clean_data = clean_data %>%
filter(!is.na(rating) & !is.na(userRatingCount))
# Clean boolean columns
bool_cols = c("takeout", "delivery", "dineIn", "curbsidePickup", "reservable",
"servesLunch", "servesDinner", "servesBeer", "servesWine", "liveMusic",
"servesCocktails", "goodForChildren", "acceptsCreditCards",
"acceptsDebitCards", "acceptsCashOnly", "acceptsNfc", "freeParkingLot",
"freeStreetParking", "wheelchairAccessibleEntrance",
"wheelchairAccessibleRestroom", "wheelchairAccessibleSeating")
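# Keep observed TRUE/FALSE values and impute missing flags as FALSE.
# The columns are already logical, so comparing them with the string "False"
# never matches and recodes every observed FALSE as TRUE; the flag counts in
# the summaries printed further down come from that string comparison.
clean_data = clean_data %>%
mutate(across(all_of(bool_cols), ~ coalesce(.x, FALSE)))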
# Clean price columns
clean_data = clean_data %>%
mutate(priceStartUSD = as.numeric(priceStartUSD),
priceEndUSD = as.numeric(priceEndUSD)) %>%
filter(!is.na(priceStartUSD) & !is.na(priceEndUSD))
cat("\n--- Final Cleaned Data Dimensions (R) ---\n")
##
## --- Final Cleaned Data Dimensions (R) ---
cat("Observations (Rows):", nrow(clean_data), "\n")
## Observations (Rows): 7378
cat("Attributes (Columns):", ncol(clean_data), "\n")
## Attributes (Columns): 27
cat("\n--- Missing Values Check After Final Clean (R) ---\n")
##
## --- Missing Values Check After Final Clean (R) ---
sapply(clean_data, function(x) sum(is.na(x)))
## rating userRatingCount
## 0 0
## businessStatus primaryType
## 0 0
## takeout delivery
## 0 0
## dineIn curbsidePickup
## 0 0
## reservable servesLunch
## 0 0
## servesDinner servesBeer
## 0 0
## servesWine liveMusic
## 0 0
## servesCocktails goodForChildren
## 0 0
## acceptsCreditCards acceptsDebitCards
## 0 0
## acceptsCashOnly acceptsNfc
## 0 0
## freeParkingLot freeStreetParking
## 0 0
## wheelchairAccessibleEntrance wheelchairAccessibleRestroom
## 0 0
## wheelchairAccessibleSeating priceStartUSD
## 0 0
## priceEndUSD
## 0
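A zero missing-value count confirms that the NAs were filled, but not that the observed values survived the imputation step. As a minimal sketch on a made-up logical vector (`flag` is an assumption for illustration), the snippet contrasts a string comparison with a direct replacement of the missing entries:

```r
# Made-up logical flag with one missing entry.
flag <- c(TRUE, FALSE, NA, FALSE)

# Comparing a logical vector with "False" coerces it to "TRUE"/"FALSE",
# so the comparison never matches and observed FALSE values flip to TRUE.
string_compare <- ifelse(is.na(flag) | flag == "False", FALSE, TRUE)

# Replacing only the missing entries keeps observed values intact.
na_as_false <- replace(flag, is.na(flag), FALSE)

print(string_compare)
print(na_as_false)
```

Comparing the TRUE/FALSE counts of each flag before and after cleaning is a quick way to catch this kind of recoding.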
# Save Cleaned Data
output_file = "data/restaurant_cleaned.csv"
write_csv(clean_data, output_file)
# Display structure and summary of cleaned data
summary(clean_data)
## rating userRatingCount businessStatus primaryType
## Min. :1.200 Min. : 2.0 Length:7378 Length:7378
## 1st Qu.:3.900 1st Qu.: 159.2 Class :character Class :character
## Median :4.300 Median : 384.0 Mode :character Mode :character
## Mean :4.195 Mean : 681.8
## 3rd Qu.:4.500 3rd Qu.: 808.0
## Max. :5.000 Max. :23596.0
## takeout delivery dineIn curbsidePickup
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:141 FALSE:541 FALSE:500 FALSE:3709
## TRUE :7237 TRUE :6837 TRUE :6878 TRUE :3669
##
##
##
## reservable servesLunch servesDinner servesBeer
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:2099 FALSE:643 FALSE:838 FALSE:1427
## TRUE :5279 TRUE :6735 TRUE :6540 TRUE :5951
##
##
##
## servesWine liveMusic servesCocktails goodForChildren
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:1592 FALSE:1421 FALSE:1941 FALSE:1309
## TRUE :5786 TRUE :5957 TRUE :5437 TRUE :6069
##
##
##
## acceptsCreditCards acceptsDebitCards acceptsCashOnly acceptsNfc
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:703 FALSE:453 FALSE:235 FALSE:1254
## TRUE :6675 TRUE :6925 TRUE :7143 TRUE :6124
##
##
##
## freeParkingLot freeStreetParking wheelchairAccessibleEntrance
## Mode :logical Mode :logical Mode :logical
## FALSE:3558 FALSE:3138 FALSE:1749
## TRUE :3820 TRUE :4240 TRUE :5629
##
##
##
## wheelchairAccessibleRestroom wheelchairAccessibleSeating priceStartUSD
## Mode :logical Mode :logical Min. : 1.00
## FALSE:3040 FALSE:2697 1st Qu.:10.00
## TRUE :4338 TRUE :4681 Median :10.00
## Mean :12.42
## 3rd Qu.:10.00
## Max. :50.00
## priceEndUSD
## Min. : 10.00
## 1st Qu.: 20.00
## Median : 20.00
## Mean : 23.87
## 3rd Qu.: 20.00
## Max. :100.00
top_categories = clean_data %>%
count(primaryType, sort = TRUE) %>%
slice_max(n, n = 15)
# Create the bar chart
ggplot(top_categories, aes(x = reorder(primaryType, n), y = n)) +
geom_col(fill = "lightblue") +
coord_flip() + # Flips the axes to make labels easier to read
labs(
title = "Top 15 Most Common Restaurant Categories",
x = "Restaurant Category",
y = "Number of Restaurants"
) +
theme_minimal() # Number of restaurants in the top 15 categories
###Bar chart for Business Status
ggplot(clean_data, aes(x = reorder(businessStatus, -table(businessStatus)[businessStatus]))) +
geom_bar(fill = "skyblue", color = "black") +
geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.3) +
labs(
title = "Distribution of Restaurant Business Status",
x = "Business Status",
y = "Number of Restaurants"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Create Overlay Histogram for Rating by Reservability
ggplot(clean_data, aes(x = rating, fill = factor(reservable))) +
geom_histogram(aes(y = after_stat(count / sum(count)) * 100),
position = "identity", alpha = 0.6, bins = 20, color = "black") +
scale_fill_manual(values = c("steelblue", "salmon", "lightgreen"), name = "Reservable") +
labs(
title = "Percentage Distribution of Ratings by Reservability",
x = "Rating",
y = "Percentage of Restaurants"
) +
theme_minimal()
# Using geom_density is often a better choice than an overlaid histogram for comparing distributions because it
# provides a smoother representation and avoids issues with binning and overlapping bars, which can be hard to interpret.
mean_ratings = clean_data %>%
group_by(reservable) %>%
summarise(mean_rating = mean(rating))
# Create an overlaid density plot for Rating by Reservability
ggplot(clean_data, aes(x = rating, fill = reservable)) +
geom_density(alpha = 0.6, color = "black") +
geom_vline(data = mean_ratings, aes(xintercept = mean_rating, color = reservable),
linetype = "dashed", linewidth = 1, show.legend = FALSE) +
scale_fill_manual(
name = "Reservable",
values = c("FALSE" = "salmon", "TRUE" = "steelblue"),
labels = c("No", "Yes")
) +
scale_color_manual(
values = c("FALSE" = "darkred", "TRUE" = "darkblue")
) +
labs(
title = "Distribution of Ratings by Reservability",
x = "Rating",
y = "Density"
) +
theme_minimal()
A histogram helps visualize the central tendency and spread of restaurant ratings.
ggplot(clean_data, aes(x=rating)) +
geom_histogram(binwidth = 0.2, fill='skyblue', color='black', alpha=0.8) +
# Add density line (equivalent to KDE), rescaled from density to counts
# (count * binwidth) so it shares the histogram's y-axis
geom_density(aes(y = after_stat(count) * 0.2), color = "blue", linewidth = 1) +
# Add vertical line for the mean
geom_vline(aes(xintercept = mean(rating)),
color = "red", linetype = "dashed", linewidth = 1) +
labs(
title = 'Distribution of Restaurant Ratings in Chicago',
x = 'Rating',
y = 'Frequency'
)
# Histogram of ratings in Chicago
#hist(clean_data$rating, main="Rating of Restaurants in Chicago", xlab="Ratings", col="lightblue")
# Histogram of rating counts in Chicago
hist(clean_data$userRatingCount, main="Rating Count of Restaurants in Chicago", xlab="Rating Count", breaks=100, col="lightblue")
The histogram reveals that the majority of restaurants have ratings clustered between 4.0 and 4.6, suggesting a high concentration of well-regarded businesses. The mean rating of approximately 4.195 sits below the median of 4.3, which indicates a negative (left) skew: a tail of poorly rated restaurants pulls the mean down.
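The direction of skew can be checked numerically by comparing the mean with the median and by computing a moment-based skewness coefficient. The following is a minimal sketch on made-up ratings (the `ratings` vector and the `skewness` helper are assumptions for illustration, not part of the analysis):

```r
# Made-up ratings with a tail toward low values, for illustration only.
ratings <- c(4.6, 4.5, 4.4, 4.3, 4.3, 4.2, 4.0, 3.2, 2.1)

# Moment-based sample skewness: negative values indicate a longer left tail.
skewness <- function(x) {
  centered <- x - mean(x)
  mean(centered^3) / mean(centered^2)^1.5
}

cat("mean:", round(mean(ratings), 3), " median:", median(ratings), "\n")
cat("skewness:", round(skewness(ratings), 3), "\n")
```

A mean below the median together with a negative coefficient points to a left-skewed distribution, which is the usual shape for bounded rating scales where most scores sit near the top.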
A boxplot helps visualize how a quantitative variable
(rating) is distributed across different categories
(primaryType).
# identify top 10 types
top_types = clean_data %>%
count(primaryType, sort = TRUE) %>%
slice_head(n=10) %>%
pull(primaryType)
data_top_types = clean_data %>%
# keep only the top 10 types; otherwise every other type becomes NA in the factor
filter(primaryType %in% top_types) %>%
# convert primaryType to a factor for proper ordering in the plot
mutate(primaryType=factor(primaryType, levels = rev(top_types)))
# generate boxplot
ggplot(data_top_types, aes(x=rating, y=primaryType, fill=primaryType)) +
geom_boxplot() +
labs(
title = 'Rating Distribution Across Top 10 Restaurant Types',
x = 'Rating',
y = 'Primary Type'
)
# Ratings boxplot
boxplot(x=clean_data$rating, main="Ratings of Restaurants in Chicago", col="lightblue")
primaryType vs. Rating
We are testing whether the population mean rating is equal across the different categories of restaurant
primaryType (both among the top 5 types and across all types).
\[ H_0: {\mu}_{type1} = {\mu}_{type2} = {\mu}_{type3} = {\mu}_{type4} = {\mu}_{type5} \]
\[ H_1: {\mu}_{type_i} \ne {\mu}_{type_j} \text{ for at least one pair } i \ne j \]
Test Method: One-way Analysis of Variance (ANOVA), which tests the association between one numerical variable and one categorical variable.
Significance level (\(\alpha\)): 0.05
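To make the F statistic less of a black box, the sketch below computes it by hand as the ratio of the between-group to the within-group mean square and compares it with `aov()`. The data frame `toy` and its three categories are made up for illustration:

```r
# Made-up ratings for three hypothetical restaurant types.
toy <- data.frame(
  rating = c(4.1, 4.3, 4.5, 3.6, 3.8, 4.0, 4.4, 4.6, 4.8),
  type   = factor(rep(c("cafe", "diner", "sushi"), each = 3))
)

grand_mean  <- mean(toy$rating)
group_means <- tapply(toy$rating, toy$type, mean)
group_sizes <- tapply(toy$rating, toy$type, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((toy$rating - group_means[as.character(toy$type)])^2)

k <- nlevels(toy$type)  # number of groups
n <- nrow(toy)          # number of observations
f_manual <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_manual <- pf(f_manual, k - 1, n - k, lower.tail = FALSE)

# The same F statistic as reported by aov().
f_aov <- summary(aov(rating ~ type, data = toy))[[1]][["F value"]][1]
cat("F (manual):", f_manual, " F (aov):", f_aov, " p:", p_manual, "\n")
```

A large F means the group means are spread further apart than the within-group noise would explain, which is what a small p-value summarizes.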
# filter for top 5 primaryTypes
top_5_primeTypes = clean_data %>%
count(primaryType, sort = TRUE) %>%
slice_head(n=5) %>%
pull(primaryType)
top_5_anova_data = clean_data %>%
filter(primaryType %in% top_5_primeTypes) %>%
mutate(primaryType = factor(primaryType)) # Ensure the type is a factor
# ANOVA for top 5 primaryTypes
anova_5_results = aov(rating ~ primaryType, data=top_5_anova_data)
# view summary of top 5 primaryTypes
summary(anova_5_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## primaryType 4 82.5 20.620 89.21 <2e-16 ***
## Residuals 4200 970.8 0.231
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ANOVA for all primaryTypes
anova_results = aov(rating ~ primaryType, data=clean_data)
# View summary
summary(anova_results)
## Df Sum Sq Mean Sq F value Pr(>F)
## primaryType 55 294 5.345 25.43 <2e-16 ***
## Residuals 7322 1539 0.210
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion
- P-value Analysis: The calculated p-value is extremely small (< 2e-16 in both tests), far less than the significance level of \(\alpha=0.05\).
- Decision: Since the p-value is less than \(\alpha\), we reject the null hypothesis.
- We have statistical evidence to conclude that the mean customer rating differs among the top five primary restaurant types, and likewise when all types are included. In other words, restaurant type is significantly associated with customer rating.
Delivery vs. Rating
We are testing whether restaurants that offer delivery services have higher average ratings than those that do not.
Null Hypothesis: Restaurants that offer delivery have the same average rating as those that don’t.
Alternative Hypothesis: Restaurants that offer delivery have a higher average rating than those that don’t.
Test Method: One-tailed Welch t-test comparing the means of two independent samples (delivery = TRUE vs. delivery = FALSE)
Significance level (\(\alpha\)): 0.05
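For reference, the unequal-variance (Welch) form of the two-sample t-test can be reproduced by hand. The sketch below uses two made-up rating vectors (`with_delivery` and `without_delivery` are assumptions for illustration) and checks the manual statistic, degrees of freedom, and one-sided p-value against `t.test()`:

```r
# Made-up ratings for two hypothetical groups.
with_delivery    <- c(4.4, 4.1, 4.6, 4.3, 4.5, 4.2)
without_delivery <- c(4.0, 4.3, 3.9, 4.4, 4.1)

# Squared standard errors of each group mean.
se2_x <- var(with_delivery) / length(with_delivery)
se2_y <- var(without_delivery) / length(without_delivery)

# Welch statistic and Welch-Satterthwaite degrees of freedom.
t_manual  <- (mean(with_delivery) - mean(without_delivery)) / sqrt(se2_x + se2_y)
df_manual <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(with_delivery) - 1) + se2_y^2 / (length(without_delivery) - 1))

# One-sided p-value for H1: mean(with) > mean(without).
p_manual <- pt(t_manual, df_manual, lower.tail = FALSE)

fit <- t.test(with_delivery, without_delivery,
              alternative = "greater", var.equal = FALSE)
cat("t:", t_manual, " df:", df_manual, " p:", p_manual, "\n")
```

Because the degrees of freedom are estimated from the two sample variances, they are generally not a whole number, which is why Welch output reports fractional values.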
# Check delivery value counts
table(clean_data$delivery)
##
## FALSE TRUE
## 541 6837
delivery_true = clean_data$rating[clean_data$delivery == TRUE]
delivery_false = clean_data$rating[clean_data$delivery == FALSE]
# One-tailed t-test: H1 = delivery has higher rating
t_result = t.test(delivery_true, delivery_false, alternative = "greater", var.equal = FALSE)
t_result
##
## Welch Two Sample t-test
##
## data: delivery_true and delivery_false
## t = 0.34527, df = 588.19, p-value = 0.365
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## -0.03726101 Inf
## sample estimates:
## mean of x mean of y
## 4.195832 4.185952
Conclusion
- P-value Analysis: The calculated p-value is 0.365, which is greater than the significance level of \(\alpha=0.05\).
- Decision: Since the p-value is greater than \(\alpha\), we fail to reject the null hypothesis.
- There is insufficient statistical evidence to conclude that restaurants offering delivery services have higher average ratings compared to those that do not. The sample means are nearly identical: 4.196 with delivery versus 4.186 without, a difference of about 0.01.
library(MLmetrics)
##
## Attaching package: 'MLmetrics'
## The following object is masked from 'package:base':
##
## Recall
library(MASS) # Run stepwise regression, MASS package required
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
library(caTools) # for stratified splitting
Split the data into training (80%) and test (20%) sets, RestaurantTraining and RestaurantTest
respectively.
# Setting the seed fixes the randomness in the split for reproducibility
set.seed(42)
# Ensure primaryType is a factor
data_to_split = clean_data %>%
mutate(primaryType = as.factor(primaryType))
# perform stratified split on the primaryType column to maintain its distribution
split = sample.split(data_to_split$primaryType, SplitRatio = 0.8)
RestaurantTraining = subset(data_to_split, split == TRUE)
RestaurantTest = subset(data_to_split, split == FALSE)
# # Check new dimensions
# cat("Training set size:", nrow(RestaurantTraining), "\n")
# cat("Testing set size:", nrow(RestaurantTest), "\n")
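To illustrate what a stratified split preserves, here is a minimal base-R sketch that samples 80% of the rows within each category. It is not the caTools implementation; the `types` vector and its group sizes are made up for illustration:

```r
set.seed(42)
# Made-up category labels; sizes chosen so that 80% is a whole number.
types <- rep(c("cafe", "bar", "pizza"), times = c(50, 30, 20))

# Sample 80% of the row indices within each category.
idx_by_type <- split(seq_along(types), types)
train_idx <- unlist(lapply(idx_by_type, function(i) {
  i[sample.int(length(i), size = round(0.8 * length(i)))]
}))
test_idx <- setdiff(seq_along(types), train_idx)

# Category shares should match between the full data and the training set.
print(prop.table(table(types)))
print(prop.table(table(types[train_idx])))
```

With real data the group sizes rarely divide evenly, so the shares match only approximately, and very rare categories may end up entirely in one of the two sets.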
# construct multiple linear regression model
mr_model = lm(rating ~ . , data = RestaurantTraining )
# Display summary of the model
# Residuals (the error distribution):
# - Median: should be close to zero. If the median is far from zero, it suggests the model is systematically biased (e.g., over- or under-predicting).
# - Min & Max: indicate the size of the largest errors. Extremely large positive or negative values suggest outliers or heteroscedasticity.
# - 1Q & 3Q: show the spread of the middle 50% of the errors. These should be relatively symmetric around zero.
summary(mr_model)
##
## Call:
## lm(formula = rating ~ ., data = RestaurantTraining)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.06849 -0.19696 0.05923 0.27832 1.17792
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.506e+00 2.660e-01 16.940 < 2e-16 ***
## userRatingCount 3.113e-05 5.866e-06 5.306 1.16e-07 ***
## businessStatusCLOSED_TEMPORARILY 2.614e-01 1.345e-01 1.944 0.051957 .
## businessStatusOPERATIONAL 1.866e-01 1.302e-01 1.434 0.151653
## primaryTypeafghani_restaurant -3.271e-01 5.022e-01 -0.651 0.514878
## primaryTypeafrican_restaurant -2.960e-01 2.760e-01 -1.073 0.283516
## primaryTypeamerican_restaurant -4.058e-01 2.266e-01 -1.791 0.073326 .
## primaryTypeasian_restaurant -3.565e-01 2.348e-01 -1.519 0.128936
## primaryTypebagel_shop -4.768e-01 2.531e-01 -1.884 0.059679 .
## primaryTypebakery -1.970e-01 2.390e-01 -0.824 0.409995
## primaryTypebar -3.191e-01 2.281e-01 -1.399 0.161899
## primaryTypebar_and_grill -2.935e-01 2.294e-01 -1.279 0.200884
## primaryTypebarbecue_restaurant -6.084e-01 2.335e-01 -2.605 0.009203 **
## primaryTypebrazilian_restaurant -2.812e-01 3.195e-01 -0.880 0.378736
## primaryTypebreakfast_restaurant -3.291e-01 2.284e-01 -1.440 0.149800
## primaryTypebrunch_restaurant -3.475e-01 2.529e-01 -1.374 0.169469
## primaryTypebuffet_restaurant -6.461e-01 2.696e-01 -2.396 0.016600 *
## primaryTypecafe -4.589e-01 2.310e-01 -1.987 0.046993 *
## primaryTypecafeteria 2.670e-01 5.013e-01 0.533 0.594296
## primaryTypechinese_restaurant -6.010e-01 2.272e-01 -2.645 0.008189 **
## primaryTypecoffee_shop -6.989e-01 2.269e-01 -3.080 0.002080 **
## primaryTypedeli 2.915e-02 2.547e-01 0.114 0.908862
## primaryTypediner -3.329e-01 2.414e-01 -1.379 0.167866
## primaryTypedonut_shop -2.581e-01 2.576e-01 -1.002 0.316447
## primaryTypefast_food_restaurant -7.627e-01 2.282e-01 -3.342 0.000836 ***
## primaryTypefine_dining_restaurant -4.743e-01 3.898e-01 -1.217 0.223665
## primaryTypefood_court -3.954e-02 3.887e-01 -0.102 0.918982
## primaryTypefood_store -1.098e-01 3.015e-01 -0.364 0.715765
## primaryTypefrench_restaurant -2.362e-01 2.673e-01 -0.884 0.376796
## primaryTypegreek_restaurant -3.097e-01 2.446e-01 -1.266 0.205592
## primaryTypehamburger_restaurant -5.093e-01 2.325e-01 -2.190 0.028545 *
## primaryTypeindian_restaurant -4.151e-01 2.330e-01 -1.781 0.074894 .
## primaryTypeitalian_restaurant -2.866e-01 2.288e-01 -1.252 0.210439
## primaryTypejapanese_restaurant -2.903e-01 2.338e-01 -1.241 0.214535
## primaryTypejuice_shop -3.313e-01 2.396e-01 -1.382 0.166892
## primaryTypekorean_restaurant -2.199e-01 2.377e-01 -0.925 0.355011
## primaryTypelebanese_restaurant 1.318e-02 3.896e-01 0.034 0.973017
## primaryTypemeal_delivery -9.386e-01 2.338e-01 -4.015 6.03e-05 ***
## primaryTypemeal_takeaway -2.581e-01 2.405e-01 -1.073 0.283214
## primaryTypemediterranean_restaurant -2.436e-01 2.334e-01 -1.043 0.296828
## primaryTypemexican_restaurant -4.086e-01 2.260e-01 -1.808 0.070646 .
## primaryTypemiddle_eastern_restaurant -1.887e-01 2.367e-01 -0.797 0.425415
## primaryTypenight_club -4.618e-01 5.005e-01 -0.923 0.356171
## primaryTypepizza_restaurant -5.064e-01 2.266e-01 -2.235 0.025442 *
## primaryTypepub -2.438e-01 2.353e-01 -1.036 0.300048
## primaryTyperamen_restaurant -2.172e-01 2.418e-01 -0.898 0.369146
## primaryTyperestaurant -4.160e-01 2.255e-01 -1.845 0.065126 .
## primaryTypesandwich_shop -6.946e-01 2.267e-01 -3.064 0.002193 **
## primaryTypeseafood_restaurant -4.524e-01 2.312e-01 -1.957 0.050393 .
## primaryTypespanish_restaurant -3.263e-01 2.912e-01 -1.121 0.262527
## primaryTypesteak_house -3.588e-01 2.453e-01 -1.463 0.143579
## primaryTypesushi_restaurant -1.352e-01 2.325e-01 -0.581 0.560989
## primaryTypetea_house -1.239e-01 3.174e-01 -0.391 0.696168
## primaryTypethai_restaurant -2.609e-01 2.304e-01 -1.133 0.257418
## primaryTypeturkish_restaurant -5.449e-02 2.703e-01 -0.202 0.840264
## primaryTypevegan_restaurant -1.328e-01 2.455e-01 -0.541 0.588536
## primaryTypevegetarian_restaurant -5.087e-01 3.428e-01 -1.484 0.137967
## primaryTypevietnamese_restaurant -1.767e-01 2.378e-01 -0.743 0.457588
## primaryTypewine_bar -2.711e-01 3.009e-01 -0.901 0.367705
## takeoutTRUE -3.823e-02 4.660e-02 -0.821 0.411940
## deliveryTRUE 8.497e-02 2.398e-02 3.544 0.000398 ***
## dineInTRUE 3.477e-02 2.572e-02 1.352 0.176469
## curbsidePickupTRUE 6.097e-02 1.224e-02 4.980 6.56e-07 ***
## reservableTRUE -2.400e-03 1.787e-02 -0.134 0.893127
## servesLunchTRUE -2.746e-02 2.956e-02 -0.929 0.352911
## servesDinnerTRUE -1.184e-01 2.657e-02 -4.457 8.46e-06 ***
## servesBeerTRUE 4.065e-02 3.238e-02 1.255 0.209461
## servesWineTRUE 6.476e-02 2.939e-02 2.204 0.027572 *
## liveMusicTRUE -1.809e-02 2.233e-02 -0.810 0.418006
## servesCocktailsTRUE -8.048e-02 2.656e-02 -3.031 0.002451 **
## goodForChildrenTRUE 1.389e-01 1.842e-02 7.540 5.43e-14 ***
## acceptsCreditCardsTRUE 9.165e-02 2.831e-02 3.237 0.001215 **
## acceptsDebitCardsTRUE -7.960e-02 3.883e-02 -2.050 0.040385 *
## acceptsCashOnlyTRUE -1.496e-01 5.334e-02 -2.804 0.005062 **
## acceptsNfcTRUE -5.688e-02 1.935e-02 -2.940 0.003299 **
## freeParkingLotTRUE -7.612e-02 1.381e-02 -5.512 3.69e-08 ***
## freeStreetParkingTRUE 1.059e-01 1.357e-02 7.807 6.92e-15 ***
## wheelchairAccessibleEntranceTRUE -9.515e-02 1.760e-02 -5.407 6.67e-08 ***
## wheelchairAccessibleRestroomTRUE -2.438e-02 1.622e-02 -1.503 0.132918
## wheelchairAccessibleSeatingTRUE 3.507e-02 1.716e-02 2.044 0.041025 *
## priceStartUSD 3.368e-03 3.006e-03 1.120 0.262551
## priceEndUSD 1.294e-03 1.684e-03 0.768 0.442288
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.447 on 5819 degrees of freedom
## Multiple R-squared: 0.2088, Adjusted R-squared: 0.1978
## F-statistic: 18.96 on 81 and 5819 DF, p-value: < 2.2e-16
Conclusion
The following predictors appear to have a statistically significant relationship to the response variable (p-value < 0.05):
userRatingCount, deliveryTRUE, curbsidePickupTRUE, servesDinnerTRUE, servesWineTRUE, servesCocktailsTRUE, goodForChildrenTRUE, acceptsCreditCardsTRUE, acceptsDebitCardsTRUE, acceptsCashOnlyTRUE, acceptsNfcTRUE, freeParkingLotTRUE, freeStreetParkingTRUE, wheelchairAccessibleEntranceTRUE, wheelchairAccessibleSeatingTRUE, and several levels of the categorical variable primaryType: primaryTypebarbecue_restaurant, primaryTypebuffet_restaurant, primaryTypecafe, primaryTypechinese_restaurant, primaryTypecoffee_shop, primaryTypefast_food_restaurant, primaryTypehamburger_restaurant, primaryTypemeal_delivery, primaryTypepizza_restaurant, primaryTypesandwich_shop. Each primaryType coefficient is measured relative to the omitted baseline category.
Residual Standard Error (RSE) is 0.447 on 5819 degrees of freedom.
Multiple R-squared (\(R^2\)) is 0.2088. The model explains 20.88% of the variance in rating.
Adjusted R-squared (\(R^2_{adj}\)) is 0.1978. This is a better measure for comparison, as it penalizes models for including irrelevant variables. Here it is only about 0.011 below \(R^2\), so the penalty for the 81 estimated coefficients is modest; even so, most of the primaryType dummy variables are individually non-significant, and the model leaves roughly 80% of the variance in rating unexplained.
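The adjusted value can be recovered from the figures in the model summary with \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). A minimal sketch, taking \(R^2\), the 81 model degrees of freedom, and the 5819 residual degrees of freedom from the output above (the variable names are illustrative):

```r
r2          <- 0.2088  # Multiple R-squared from the summary
p           <- 81      # model degrees of freedom (coefficients excluding the intercept)
residual_df <- 5819    # residual degrees of freedom
n           <- residual_df + p + 1  # implied number of training rows

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
cat("n:", n, " adjusted R^2:", round(adj_r2, 4), "\n")
```

With several thousand rows and 81 coefficients the correction factor stays close to 1, which is why the two statistics differ by only about one percentage point.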
Predict rating in
RestaurantTest and calculate MAE and
MSE.
# Predicting rating
ypred = predict(object=mr_model, newdata = RestaurantTest)
# Mean Absolute Error (MAE) of predicted vs. actual rating
# MAE measures the average magnitude of errors in a set of predictions, without considering their direction (it ignores positive or negative signs). It is the simplest and most intuitive metric because it is in the same units as the target variable.
# On average, the model's predicted rating is off by about 0.319 rating points from the actual customer rating.
MAE(y_pred = ypred , y_true = RestaurantTest$rating)
## [1] 0.3188165
# Mean Squared Error (MSE) of predicted vs. actual rating
# MSE measures the average squared difference between the predicted and actual values. By squaring the errors, MSE penalizes large errors much more heavily than MAE.
MSE(y_pred = ypred, y_true = RestaurantTest$rating)
## [1] 0.1849335
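Both metrics are simple averages over the prediction errors, and taking the square root of the MSE puts it back on the rating scale. The following is a minimal base-R sketch on made-up vectors (`actual` and `predicted` are assumptions for illustration); the last lines apply the square root to the test-set MSE reported above:

```r
# Made-up actual and predicted ratings, for illustration only.
actual    <- c(4.5, 3.8, 4.2, 2.9, 4.7)
predicted <- c(4.3, 4.0, 4.2, 3.5, 4.4)

errors <- predicted - actual
mae  <- mean(abs(errors))  # average absolute error
mse  <- mean(errors^2)     # average squared error
rmse <- sqrt(mse)          # back on the scale of the ratings
cat("MAE:", mae, " MSE:", mse, " RMSE:", rmse, "\n")

# RMSE implied by the test-set MSE of 0.1849335.
reported_rmse <- sqrt(0.1849335)
cat("Reported RMSE:", round(reported_rmse, 3), "\n")
```

An RMSE noticeably larger than the MAE suggests that a minority of predictions miss by a wide margin, since squaring gives those errors extra weight.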
# create a null model / intercept-only model
null_model = lm(rating ~ 1, data = RestaurantTraining)
# create a full model
full_model = lm(rating~. , data = RestaurantTraining)
# perform forward stepwise selection using stepAIC()
step_forward = stepAIC(null_model, direction='forward', scope=formula(full_model))
## Start: AIC=-8201.43
## rating ~ 1
##
## Df Sum of Sq RSS AIC
## + primaryType 55 223.312 1246.2 -9064.1
## + priceStartUSD 1 77.820 1391.7 -8520.5
## + priceEndUSD 1 60.760 1408.8 -8448.6
## + freeParkingLot 1 28.461 1441.1 -8314.8
## + acceptsNfc 1 14.890 1454.6 -8259.5
## + servesLunch 1 13.929 1455.6 -8255.6
## + servesDinner 1 13.087 1456.5 -8252.2
## + goodForChildren 1 12.015 1457.5 -8247.9
## + acceptsDebitCards 1 11.077 1458.5 -8244.1
## + wheelchairAccessibleEntrance 1 11.067 1458.5 -8244.0
## + acceptsCashOnly 1 8.889 1460.6 -8235.2
## + wheelchairAccessibleSeating 1 7.905 1461.6 -8231.3
## + servesCocktails 1 7.168 1462.4 -8228.3
## + liveMusic 1 6.969 1462.6 -8227.5
## + reservable 1 5.223 1464.3 -8220.4
## + userRatingCount 1 4.934 1464.6 -8219.3
## + dineIn 1 2.282 1467.2 -8208.6
## + acceptsCreditCards 1 2.241 1467.3 -8208.4
## + takeout 1 1.926 1467.6 -8207.2
## + servesWine 1 1.770 1467.8 -8206.5
## + curbsidePickup 1 1.654 1467.9 -8206.1
## + servesBeer 1 1.344 1468.2 -8204.8
## + freeStreetParking 1 1.006 1468.5 -8203.5
## + wheelchairAccessibleRestroom 1 0.771 1468.8 -8202.5
## <none> 1469.5 -8201.4
## + delivery 1 0.439 1469.1 -8201.2
## + businessStatus 2 0.934 1468.6 -8201.2
##
## Step: AIC=-9064.09
## rating ~ primaryType
##
## Df Sum of Sq RSS AIC
## + priceStartUSD 1 13.1952 1233.0 -9124.9
## + priceEndUSD 1 12.2480 1234.0 -9120.4
## + goodForChildren 1 7.3870 1238.8 -9097.2
## + curbsidePickup 1 6.4304 1239.8 -9092.6
## + wheelchairAccessibleEntrance 1 5.9788 1240.2 -9090.5
## + servesDinner 1 5.9537 1240.3 -9090.3
## + acceptsCashOnly 1 5.6555 1240.6 -9088.9
## + acceptsDebitCards 1 5.5139 1240.7 -9088.3
## + acceptsNfc 1 5.1755 1241.0 -9086.6
## + userRatingCount 1 4.7609 1241.5 -9084.7
## + freeStreetParking 1 4.2582 1242.0 -9082.3
## + freeParkingLot 1 4.2314 1242.0 -9082.2
## + servesLunch 1 4.2030 1242.0 -9082.0
## + delivery 1 1.8330 1244.4 -9070.8
## + liveMusic 1 1.7038 1244.5 -9070.2
## + servesWine 1 1.2065 1245.0 -9067.8
## + dineIn 1 0.7416 1245.5 -9065.6
## + takeout 1 0.6993 1245.5 -9065.4
## + servesBeer 1 0.5761 1245.7 -9064.8
## <none> 1246.2 -9064.1
## + businessStatus 2 0.7759 1245.5 -9063.8
## + servesCocktails 1 0.2401 1246.0 -9063.2
## + wheelchairAccessibleSeating 1 0.2313 1246.0 -9063.2
## + acceptsCreditCards 1 0.1199 1246.1 -9062.7
## + reservable 1 0.0296 1246.2 -9062.2
## + wheelchairAccessibleRestroom 1 0.0281 1246.2 -9062.2
##
## Step: AIC=-9124.9
## rating ~ primaryType + priceStartUSD
##
## Df Sum of Sq RSS AIC
## + wheelchairAccessibleEntrance 1 7.8771 1225.2 -9160.7
## + goodForChildren 1 7.1272 1225.9 -9157.1
## + servesDinner 1 7.1013 1225.9 -9157.0
## + acceptsDebitCards 1 6.9493 1226.1 -9156.3
## + acceptsCashOnly 1 6.3733 1226.7 -9153.5
## + acceptsNfc 1 6.1603 1226.9 -9152.5
## + curbsidePickup 1 5.6498 1227.4 -9150.0
## + freeStreetParking 1 5.4927 1227.5 -9149.2
## + servesLunch 1 3.1223 1229.9 -9137.9
## + userRatingCount 1 2.9930 1230.0 -9137.2
## + freeParkingLot 1 2.4348 1230.6 -9134.6
## + delivery 1 1.7627 1231.3 -9131.3
## + liveMusic 1 1.2735 1231.8 -9129.0
## + servesCocktails 1 0.7240 1232.3 -9126.4
## + wheelchairAccessibleRestroom 1 0.7018 1232.3 -9126.3
## + businessStatus 2 0.8620 1232.2 -9125.0
## + servesWine 1 0.4280 1232.6 -9125.0
## <none> 1233.0 -9124.9
## + reservable 1 0.3747 1232.7 -9124.7
## + takeout 1 0.3236 1232.7 -9124.5
## + dineIn 1 0.2651 1232.8 -9124.2
## + servesBeer 1 0.1232 1232.9 -9123.5
## + acceptsCreditCards 1 0.0065 1233.0 -9122.9
## + wheelchairAccessibleSeating 1 0.0038 1233.0 -9122.9
## + priceEndUSD 1 0.0002 1233.0 -9122.9
##
## Step: AIC=-9160.72
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance
##
## Df Sum of Sq RSS AIC
## + goodForChildren 1 11.9171 1213.2 -9216.4
## + freeStreetParking 1 7.8357 1217.3 -9196.6
## + curbsidePickup 1 5.5520 1219.6 -9185.5
## + userRatingCount 1 5.2887 1219.9 -9184.3
## + servesDinner 1 4.3291 1220.8 -9179.6
## + acceptsDebitCards 1 3.5097 1221.6 -9175.7
## + acceptsCashOnly 1 3.2901 1221.9 -9174.6
## + delivery 1 2.9768 1222.2 -9173.1
## + acceptsNfc 1 2.8353 1222.3 -9172.4
## + servesWine 1 1.9664 1223.2 -9168.2
## + wheelchairAccessibleSeating 1 1.8893 1223.3 -9167.8
## + servesLunch 1 1.4940 1223.7 -9165.9
## + servesBeer 1 1.3280 1223.8 -9165.1
## + dineIn 1 1.1358 1224.0 -9164.2
## + freeParkingLot 1 1.1306 1224.0 -9164.2
## <none> 1225.2 -9160.7
## + wheelchairAccessibleRestroom 1 0.3172 1224.8 -9160.2
## + acceptsCreditCards 1 0.3139 1224.8 -9160.2
## + liveMusic 1 0.2999 1224.8 -9160.2
## + businessStatus 2 0.6180 1224.5 -9159.7
## + takeout 1 0.1055 1225.0 -9159.2
## + reservable 1 0.0394 1225.1 -9158.9
## + servesCocktails 1 0.0182 1225.1 -9158.8
## + priceEndUSD 1 0.0165 1225.1 -9158.8
##
## Step: AIC=-9216.4
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren
##
## Df Sum of Sq RSS AIC
## + servesDinner 1 8.5151 1204.7 -9256.0
## + acceptsCashOnly 1 7.4746 1205.8 -9250.9
## + freeStreetParking 1 6.5156 1206.7 -9246.2
## + acceptsDebitCards 1 6.2485 1207.0 -9244.9
## + curbsidePickup 1 5.9500 1207.3 -9243.4
## + userRatingCount 1 4.4390 1208.8 -9236.0
## + acceptsNfc 1 4.2570 1209.0 -9235.1
## + servesLunch 1 3.9060 1209.3 -9233.4
## + delivery 1 2.2192 1211.0 -9225.2
## + liveMusic 1 2.1014 1211.1 -9224.6
## + freeParkingLot 1 1.3494 1211.9 -9221.0
## + wheelchairAccessibleSeating 1 0.6120 1212.6 -9217.4
## + servesWine 1 0.5057 1212.7 -9216.9
## <none> 1213.2 -9216.4
## + takeout 1 0.3972 1212.8 -9216.3
## + servesCocktails 1 0.3788 1212.8 -9216.2
## + businessStatus 2 0.7321 1212.5 -9216.0
## + dineIn 1 0.2443 1213.0 -9215.6
## + acceptsCreditCards 1 0.1928 1213.0 -9215.3
## + servesBeer 1 0.1782 1213.0 -9215.3
## + reservable 1 0.1266 1213.1 -9215.0
## + wheelchairAccessibleRestroom 1 0.0138 1213.2 -9214.5
## + priceEndUSD 1 0.0074 1213.2 -9214.4
##
## Step: AIC=-9255.96
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner
##
## Df Sum of Sq RSS AIC
## + freeStreetParking 1 7.7245 1197.0 -9291.9
## + curbsidePickup 1 6.3059 1198.4 -9284.9
## + userRatingCount 1 5.4326 1199.3 -9280.6
## + acceptsCashOnly 1 3.3826 1201.3 -9270.6
## + delivery 1 3.1954 1201.5 -9269.6
## + acceptsDebitCards 1 3.1035 1201.6 -9269.2
## + acceptsNfc 1 2.7077 1202.0 -9267.2
## + wheelchairAccessibleSeating 1 1.4288 1203.3 -9261.0
## + servesWine 1 1.3903 1203.3 -9260.8
## + freeParkingLot 1 0.8805 1203.8 -9258.3
## + servesBeer 1 0.8702 1203.8 -9258.2
## + liveMusic 1 0.6821 1204.0 -9257.3
## + servesLunch 1 0.6305 1204.1 -9257.1
## <none> 1204.7 -9256.0
## + dineIn 1 0.3535 1204.4 -9255.7
## + businessStatus 2 0.5703 1204.2 -9254.8
## + reservable 1 0.0518 1204.7 -9254.2
## + acceptsCreditCards 1 0.0462 1204.7 -9254.2
## + takeout 1 0.0280 1204.7 -9254.1
## + priceEndUSD 1 0.0173 1204.7 -9254.0
## + wheelchairAccessibleRestroom 1 0.0126 1204.7 -9254.0
## + servesCocktails 1 0.0107 1204.7 -9254.0
##
## Step: AIC=-9291.92
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking
##
## Df Sum of Sq RSS AIC
## + curbsidePickup 1 5.8111 1191.2 -9318.6
## + userRatingCount 1 5.1661 1191.8 -9315.4
## + freeParkingLot 1 5.1259 1191.9 -9315.2
## + acceptsCashOnly 1 4.0710 1192.9 -9310.0
## + acceptsDebitCards 1 3.6984 1193.3 -9308.2
## + acceptsNfc 1 3.2275 1193.8 -9305.9
## + delivery 1 2.7626 1194.2 -9303.6
## + servesWine 1 0.9353 1196.1 -9294.5
## + wheelchairAccessibleSeating 1 0.8050 1196.2 -9293.9
## + liveMusic 1 0.7626 1196.2 -9293.7
## + servesLunch 1 0.7517 1196.2 -9293.6
## + servesBeer 1 0.4837 1196.5 -9292.3
## <none> 1197.0 -9291.9
## + dineIn 1 0.3100 1196.7 -9291.5
## + businessStatus 2 0.6851 1196.3 -9291.3
## + servesCocktails 1 0.1481 1196.8 -9290.7
## + takeout 1 0.0840 1196.9 -9290.3
## + priceEndUSD 1 0.0101 1197.0 -9290.0
## + wheelchairAccessibleRestroom 1 0.0080 1197.0 -9290.0
## + acceptsCreditCards 1 0.0046 1197.0 -9289.9
## + reservable 1 0.0017 1197.0 -9289.9
##
## Step: AIC=-9318.64
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup
##
## Df Sum of Sq RSS AIC
## + freeParkingLot 1 5.8030 1185.4 -9345.5
## + userRatingCount 1 4.6745 1186.5 -9339.8
## + acceptsCashOnly 1 3.9956 1187.2 -9336.5
## + acceptsDebitCards 1 3.8173 1187.4 -9335.6
## + acceptsNfc 1 3.1479 1188.0 -9332.3
## + delivery 1 1.7966 1189.4 -9325.5
## + servesWine 1 1.2005 1190.0 -9322.6
## + wheelchairAccessibleSeating 1 0.7459 1190.4 -9320.3
## + servesBeer 1 0.7343 1190.5 -9320.3
## + servesLunch 1 0.6636 1190.5 -9319.9
## + liveMusic 1 0.5247 1190.7 -9319.2
## <none> 1191.2 -9318.6
## + businessStatus 2 0.6784 1190.5 -9318.0
## + dineIn 1 0.1829 1191.0 -9317.5
## + takeout 1 0.1540 1191.0 -9317.4
## + servesCocktails 1 0.0947 1191.1 -9317.1
## + acceptsCreditCards 1 0.0171 1191.2 -9316.7
## + priceEndUSD 1 0.0112 1191.2 -9316.7
## + wheelchairAccessibleRestroom 1 0.0069 1191.2 -9316.7
## + reservable 1 0.0055 1191.2 -9316.7
##
## Step: AIC=-9345.46
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot
##
## Df Sum of Sq RSS AIC
## + userRatingCount 1 4.8170 1180.6 -9367.5
## + acceptsCashOnly 1 4.2835 1181.1 -9364.8
## + acceptsDebitCards 1 3.9070 1181.5 -9362.9
## + acceptsNfc 1 3.0568 1182.3 -9358.7
## + delivery 1 1.8491 1183.5 -9352.7
## + servesWine 1 1.2140 1184.2 -9349.5
## + wheelchairAccessibleSeating 1 0.8423 1184.5 -9347.7
## + servesBeer 1 0.7574 1184.6 -9347.2
## + servesLunch 1 0.6590 1184.7 -9346.7
## + liveMusic 1 0.5761 1184.8 -9346.3
## <none> 1185.4 -9345.5
## + businessStatus 2 0.7579 1184.6 -9345.2
## + dineIn 1 0.1705 1185.2 -9344.3
## + takeout 1 0.1492 1185.2 -9344.2
## + servesCocktails 1 0.0682 1185.3 -9343.8
## + acceptsCreditCards 1 0.0236 1185.3 -9343.6
## + wheelchairAccessibleRestroom 1 0.0113 1185.4 -9343.5
## + reservable 1 0.0103 1185.4 -9343.5
## + priceEndUSD 1 0.0000 1185.4 -9343.5
##
## Step: AIC=-9367.49
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount
##
## Df Sum of Sq RSS AIC
## + acceptsCashOnly 1 4.3537 1176.2 -9387.3
## + acceptsDebitCards 1 4.0675 1176.5 -9385.9
## + acceptsNfc 1 3.8273 1176.7 -9384.6
## + delivery 1 1.6968 1178.9 -9374.0
## + servesWine 1 0.9395 1179.6 -9370.2
## + servesLunch 1 0.8764 1179.7 -9369.9
## + liveMusic 1 0.6059 1180.0 -9368.5
## + servesBeer 1 0.5401 1180.0 -9368.2
## + businessStatus 2 0.8085 1179.8 -9367.5
## <none> 1180.6 -9367.5
## + wheelchairAccessibleSeating 1 0.3796 1180.2 -9367.4
## + servesCocktails 1 0.2157 1180.3 -9366.6
## + takeout 1 0.2145 1180.3 -9366.6
## + dineIn 1 0.1366 1180.4 -9366.2
## + acceptsCreditCards 1 0.0537 1180.5 -9365.8
## + wheelchairAccessibleRestroom 1 0.0209 1180.5 -9365.6
## + reservable 1 0.0176 1180.5 -9365.6
## + priceEndUSD 1 0.0091 1180.5 -9365.5
##
## Step: AIC=-9387.29
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly
##
## Df Sum of Sq RSS AIC
## + delivery 1 2.14160 1174.1 -9396.0
## + acceptsNfc 1 2.01750 1174.2 -9395.4
## + acceptsCreditCards 1 1.84518 1174.4 -9394.6
## + servesWine 1 1.42153 1174.8 -9392.4
## + servesBeer 1 0.93849 1175.3 -9390.0
## + acceptsDebitCards 1 0.83497 1175.4 -9389.5
## + wheelchairAccessibleSeating 1 0.43651 1175.8 -9387.5
## + businessStatus 2 0.83462 1175.4 -9387.5
## <none> 1176.2 -9387.3
## + servesLunch 1 0.27154 1175.9 -9386.7
## + dineIn 1 0.21008 1176.0 -9386.3
## + liveMusic 1 0.10768 1176.1 -9385.8
## + servesCocktails 1 0.05643 1176.2 -9385.6
## + wheelchairAccessibleRestroom 1 0.02987 1176.2 -9385.4
## + takeout 1 0.01529 1176.2 -9385.4
## + priceEndUSD 1 0.00575 1176.2 -9385.3
## + reservable 1 0.00016 1176.2 -9385.3
##
## Step: AIC=-9396.04
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery
##
## Df Sum of Sq RSS AIC
## + acceptsNfc 1 2.50451 1171.6 -9406.6
## + acceptsCreditCards 1 1.62176 1172.4 -9402.2
## + servesWine 1 1.29249 1172.8 -9400.5
## + acceptsDebitCards 1 1.13794 1172.9 -9399.8
## + servesBeer 1 0.85615 1173.2 -9398.3
## + businessStatus 2 0.80600 1173.3 -9396.1
## <none> 1174.1 -9396.0
## + wheelchairAccessibleSeating 1 0.39140 1173.7 -9396.0
## + servesLunch 1 0.28208 1173.8 -9395.5
## + dineIn 1 0.15423 1173.9 -9394.8
## + liveMusic 1 0.11159 1174.0 -9394.6
## + servesCocktails 1 0.08680 1174.0 -9394.5
## + takeout 1 0.08080 1174.0 -9394.4
## + wheelchairAccessibleRestroom 1 0.04735 1174.0 -9394.3
## + reservable 1 0.01147 1174.0 -9394.1
## + priceEndUSD 1 0.00895 1174.1 -9394.1
##
## Step: AIC=-9406.64
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc
##
## Df Sum of Sq RSS AIC
## + acceptsCreditCards 1 2.01908 1169.5 -9414.8
## + servesWine 1 1.68301 1169.9 -9413.1
## + servesBeer 1 1.17121 1170.4 -9410.5
## + wheelchairAccessibleSeating 1 0.54850 1171.0 -9407.4
## <none> 1171.6 -9406.6
## + businessStatus 2 0.78309 1170.8 -9406.6
## + acceptsDebitCards 1 0.32557 1171.2 -9406.3
## + servesLunch 1 0.21456 1171.3 -9405.7
## + dineIn 1 0.16382 1171.4 -9405.5
## + liveMusic 1 0.09438 1171.5 -9405.1
## + takeout 1 0.07136 1171.5 -9405.0
## + wheelchairAccessibleRestroom 1 0.01650 1171.5 -9404.7
## + reservable 1 0.01603 1171.5 -9404.7
## + servesCocktails 1 0.01463 1171.5 -9404.7
## + priceEndUSD 1 0.01043 1171.5 -9404.7
##
## Step: AIC=-9414.82
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards
##
## Df Sum of Sq RSS AIC
## + servesWine 1 1.13150 1168.4 -9418.5
## + acceptsDebitCards 1 0.85783 1168.7 -9417.2
## + businessStatus 2 1.23411 1168.3 -9417.1
## + servesBeer 1 0.67700 1168.9 -9416.2
## + wheelchairAccessibleSeating 1 0.48838 1169.0 -9415.3
## <none> 1169.5 -9414.8
## + servesLunch 1 0.23853 1169.3 -9414.0
## + liveMusic 1 0.19779 1169.3 -9413.8
## + dineIn 1 0.13657 1169.4 -9413.5
## + priceEndUSD 1 0.09352 1169.5 -9413.3
## + wheelchairAccessibleRestroom 1 0.08855 1169.5 -9413.3
## + takeout 1 0.07445 1169.5 -9413.2
## + servesCocktails 1 0.04232 1169.5 -9413.0
## + reservable 1 0.01516 1169.5 -9412.9
##
## Step: AIC=-9418.53
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine
##
## Df Sum of Sq RSS AIC
## + servesCocktails 1 1.67620 1166.7 -9425.0
## + businessStatus 2 1.27228 1167.1 -9421.0
## + acceptsDebitCards 1 0.74135 1167.7 -9420.3
## <none> 1168.4 -9418.5
## + wheelchairAccessibleSeating 1 0.34625 1168.1 -9418.3
## + liveMusic 1 0.32111 1168.1 -9418.2
## + servesLunch 1 0.26972 1168.1 -9417.9
## + dineIn 1 0.19287 1168.2 -9417.5
## + wheelchairAccessibleRestroom 1 0.12605 1168.3 -9417.2
## + priceEndUSD 1 0.10302 1168.3 -9417.1
## + takeout 1 0.08629 1168.3 -9417.0
## + reservable 1 0.01921 1168.4 -9416.6
## + servesBeer 1 0.00133 1168.4 -9416.5
##
## Step: AIC=-9425.01
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails
##
## Df Sum of Sq RSS AIC
## + businessStatus 2 1.16481 1165.6 -9426.9
## + acceptsDebitCards 1 0.72056 1166.0 -9426.7
## + wheelchairAccessibleSeating 1 0.41604 1166.3 -9425.1
## <none> 1166.7 -9425.0
## + servesBeer 1 0.27919 1166.5 -9424.4
## + servesLunch 1 0.23816 1166.5 -9424.2
## + liveMusic 1 0.23198 1166.5 -9424.2
## + dineIn 1 0.18747 1166.5 -9424.0
## + takeout 1 0.11841 1166.6 -9423.6
## + priceEndUSD 1 0.11685 1166.6 -9423.6
## + wheelchairAccessibleRestroom 1 0.10638 1166.6 -9423.5
## + reservable 1 0.00000 1166.7 -9423.0
##
## Step: AIC=-9426.9
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails +
## businessStatus
##
## Df Sum of Sq RSS AIC
## + acceptsDebitCards 1 0.78007 1164.8 -9428.9
## + wheelchairAccessibleSeating 1 0.48255 1165.1 -9427.3
## <none> 1165.6 -9426.9
## + servesBeer 1 0.28819 1165.3 -9426.4
## + dineIn 1 0.23162 1165.3 -9426.1
## + servesLunch 1 0.20987 1165.4 -9426.0
## + liveMusic 1 0.18368 1165.4 -9425.8
## + priceEndUSD 1 0.12254 1165.5 -9425.5
## + takeout 1 0.11753 1165.5 -9425.5
## + wheelchairAccessibleRestroom 1 0.09447 1165.5 -9425.4
## + reservable 1 0.00020 1165.6 -9424.9
##
## Step: AIC=-9428.85
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails +
## businessStatus + acceptsDebitCards
##
## Df Sum of Sq RSS AIC
## + wheelchairAccessibleSeating 1 0.53076 1164.3 -9429.5
## <none> 1164.8 -9428.9
## + servesBeer 1 0.27626 1164.5 -9428.3
## + dineIn 1 0.25126 1164.5 -9428.1
## + servesLunch 1 0.20096 1164.6 -9427.9
## + liveMusic 1 0.19087 1164.6 -9427.8
## + takeout 1 0.11642 1164.7 -9427.4
## + priceEndUSD 1 0.11292 1164.7 -9427.4
## + wheelchairAccessibleRestroom 1 0.09205 1164.7 -9427.3
## + reservable 1 0.00140 1164.8 -9426.9
##
## Step: AIC=-9429.54
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails +
## businessStatus + acceptsDebitCards + wheelchairAccessibleSeating
##
## Df Sum of Sq RSS AIC
## <none> 1164.3 -9429.5
## + wheelchairAccessibleRestroom 1 0.36116 1163.9 -9429.4
## + servesBeer 1 0.25154 1164.0 -9428.8
## + servesLunch 1 0.23140 1164.0 -9428.7
## + dineIn 1 0.21964 1164.0 -9428.7
## + liveMusic 1 0.19266 1164.1 -9428.5
## + priceEndUSD 1 0.14054 1164.1 -9428.3
## + takeout 1 0.11530 1164.1 -9428.1
## + reservable 1 0.01891 1164.2 -9427.6
step_forward$anova
## Stepwise Model Path
## Analysis of Deviance Table
##
## Initial Model:
## rating ~ 1
##
## Final Model:
## rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails +
## businessStatus + acceptsDebitCards + wheelchairAccessibleSeating
##
##
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 5900 1469.533 -8201.435
## 2 + primaryType 55 223.3117353 5845 1246.221 -9064.089
## 3 + priceStartUSD 1 13.1952052 5844 1233.026 -9124.902
## 4 + wheelchairAccessibleEntrance 1 7.8771187 5843 1225.149 -9160.722
## 5 + goodForChildren 1 11.9170991 5842 1213.232 -9216.402
## 6 + servesDinner 1 8.5150768 5841 1204.717 -9255.964
## 7 + freeStreetParking 1 7.7244907 5840 1196.992 -9291.922
## 8 + curbsidePickup 1 5.8111151 5839 1191.181 -9318.640
## 9 + freeParkingLot 1 5.8029831 5838 1185.378 -9345.458
## 10 + userRatingCount 1 4.8169676 5837 1180.561 -9367.486
## 11 + acceptsCashOnly 1 4.3537149 5836 1176.208 -9387.288
## 12 + delivery 1 2.1415976 5835 1174.066 -9396.043
## 13 + acceptsNfc 1 2.5045054 5834 1171.562 -9406.644
## 14 + acceptsCreditCards 1 2.0190782 5833 1169.543 -9414.823
## 15 + servesWine 1 1.1315030 5832 1168.411 -9418.534
## 16 + servesCocktails 1 1.6762050 5831 1166.735 -9425.006
## 17 + businessStatus 2 1.1648069 5829 1165.570 -9426.900
## 18 + acceptsDebitCards 1 0.7800684 5828 1164.790 -9428.851
## 19 + wheelchairAccessibleSeating 1 0.5307645 5827 1164.259 -9429.540
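The AIC values in the path above can be checked by hand. For `lm` fits, `stepAIC()` relies on `extractAIC()`, which reports n * log(RSS / n) + 2 * edf and drops additive constants, which is why the values are negative. The sketch below recomputes the first two rows from the RSS values printed in the table. The variable names are illustrative, and n = 5901 is inferred from the 5900 residual degrees of freedom of the intercept-only model.

```r
# Values copied from the stepwise path printed above
n   <- 5901   # 5900 residual df of the null model plus 1 for the intercept
rss <- c(null = 1469.533, primaryType = 1246.221)
edf <- c(null = 1, primaryType = 56)   # intercept, then intercept + 55 dummies

# AIC as reported for lm fits: n * log(RSS / n) + 2 * edf
aic <- n * log(rss / n) + 2 * edf
print(round(aic, 1))

# Drop in AIC from adding primaryType (a larger drop means a more useful term)
print(round(aic[["null"]] - aic[["primaryType"]], 1))
```

At each step, forward selection adds the candidate with the lowest resulting AIC and stops when no addition beats the `<none>` row.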
summary(step_forward)
##
## Call:
## lm(formula = rating ~ primaryType + priceStartUSD + wheelchairAccessibleEntrance +
## goodForChildren + servesDinner + freeStreetParking + curbsidePickup +
## freeParkingLot + userRatingCount + acceptsCashOnly + delivery +
## acceptsNfc + acceptsCreditCards + servesWine + servesCocktails +
## businessStatus + acceptsDebitCards + wheelchairAccessibleSeating,
## data = RestaurantTraining)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0882 -0.1989 0.0631 0.2787 1.1542
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.494e+00 2.625e-01 17.117 < 2e-16 ***
## primaryTypeafghani_restaurant -3.412e-01 5.019e-01 -0.680 0.496628
## primaryTypeafrican_restaurant -2.965e-01 2.753e-01 -1.077 0.281500
## primaryTypeamerican_restaurant -4.084e-01 2.264e-01 -1.804 0.071286 .
## primaryTypeasian_restaurant -3.614e-01 2.345e-01 -1.541 0.123379
## primaryTypebagel_shop -4.752e-01 2.526e-01 -1.882 0.059943 .
## primaryTypebakery -1.873e-01 2.383e-01 -0.786 0.431968
## primaryTypebar -3.160e-01 2.276e-01 -1.389 0.164999
## primaryTypebar_and_grill -2.945e-01 2.291e-01 -1.285 0.198682
## primaryTypebarbecue_restaurant -6.135e-01 2.333e-01 -2.630 0.008572 **
## primaryTypebrazilian_restaurant -2.874e-01 3.192e-01 -0.900 0.367927
## primaryTypebreakfast_restaurant -3.331e-01 2.281e-01 -1.461 0.144176
## primaryTypebrunch_restaurant -3.640e-01 2.524e-01 -1.442 0.149335
## primaryTypebuffet_restaurant -6.460e-01 2.695e-01 -2.397 0.016557 *
## primaryTypecafe -4.636e-01 2.306e-01 -2.010 0.044448 *
## primaryTypecafeteria 2.725e-01 5.005e-01 0.544 0.586172
## primaryTypechinese_restaurant -6.036e-01 2.270e-01 -2.659 0.007856 **
## primaryTypecoffee_shop -6.945e-01 2.264e-01 -3.068 0.002167 **
## primaryTypedeli 3.070e-02 2.544e-01 0.121 0.903939
## primaryTypediner -3.350e-01 2.411e-01 -1.389 0.164818
## primaryTypedonut_shop -2.390e-01 2.568e-01 -0.931 0.352027
## primaryTypefast_food_restaurant -7.408e-01 2.269e-01 -3.265 0.001101 **
## primaryTypefine_dining_restaurant -4.413e-01 3.888e-01 -1.135 0.256330
## primaryTypefood_court -3.344e-02 3.878e-01 -0.086 0.931281
## primaryTypefood_store -9.907e-02 3.008e-01 -0.329 0.741878
## primaryTypefrench_restaurant -2.185e-01 2.669e-01 -0.819 0.413038
## primaryTypegreek_restaurant -3.113e-01 2.444e-01 -1.274 0.202862
## primaryTypehamburger_restaurant -5.091e-01 2.323e-01 -2.192 0.028450 *
## primaryTypeindian_restaurant -4.196e-01 2.327e-01 -1.803 0.071443 .
## primaryTypeitalian_restaurant -2.915e-01 2.286e-01 -1.275 0.202399
## primaryTypejapanese_restaurant -2.942e-01 2.336e-01 -1.259 0.207950
## primaryTypejuice_shop -3.197e-01 2.387e-01 -1.339 0.180505
## primaryTypekorean_restaurant -2.172e-01 2.376e-01 -0.914 0.360550
## primaryTypelebanese_restaurant 1.740e-02 3.893e-01 0.045 0.964358
## primaryTypemeal_delivery -9.287e-01 2.333e-01 -3.981 6.95e-05 ***
## primaryTypemeal_takeaway -2.603e-01 2.403e-01 -1.083 0.278708
## primaryTypemediterranean_restaurant -2.420e-01 2.333e-01 -1.037 0.299630
## primaryTypemexican_restaurant -4.065e-01 2.257e-01 -1.801 0.071815 .
## primaryTypemiddle_eastern_restaurant -1.891e-01 2.364e-01 -0.800 0.423790
## primaryTypenight_club -4.536e-01 5.001e-01 -0.907 0.364475
## primaryTypepizza_restaurant -5.063e-01 2.264e-01 -2.236 0.025385 *
## primaryTypepub -2.426e-01 2.347e-01 -1.034 0.301209
## primaryTyperamen_restaurant -2.203e-01 2.415e-01 -0.912 0.361756
## primaryTyperestaurant -4.177e-01 2.253e-01 -1.854 0.063816 .
## primaryTypesandwich_shop -6.977e-01 2.264e-01 -3.081 0.002072 **
## primaryTypeseafood_restaurant -4.600e-01 2.310e-01 -1.991 0.046502 *
## primaryTypespanish_restaurant -3.158e-01 2.908e-01 -1.086 0.277446
## primaryTypesteak_house -3.549e-01 2.451e-01 -1.448 0.147662
## primaryTypesushi_restaurant -1.443e-01 2.321e-01 -0.622 0.534263
## primaryTypetea_house -1.208e-01 3.170e-01 -0.381 0.703166
## primaryTypethai_restaurant -2.643e-01 2.301e-01 -1.149 0.250687
## primaryTypeturkish_restaurant -5.864e-02 2.701e-01 -0.217 0.828136
## primaryTypevegan_restaurant -1.334e-01 2.453e-01 -0.544 0.586612
## primaryTypevegetarian_restaurant -5.136e-01 3.426e-01 -1.499 0.133945
## primaryTypevietnamese_restaurant -1.776e-01 2.376e-01 -0.747 0.454799
## primaryTypewine_bar -2.846e-01 3.005e-01 -0.947 0.343700
## priceStartUSD 5.758e-03 8.016e-04 7.183 7.65e-13 ***
## wheelchairAccessibleEntranceTRUE -9.989e-02 1.691e-02 -5.909 3.64e-09 ***
## goodForChildrenTRUE 1.375e-01 1.791e-02 7.677 1.89e-14 ***
## servesDinnerTRUE -1.309e-01 2.476e-02 -5.286 1.30e-07 ***
## freeStreetParkingTRUE 1.058e-01 1.354e-02 7.808 6.84e-15 ***
## curbsidePickupTRUE 6.189e-02 1.219e-02 5.078 3.93e-07 ***
## freeParkingLotTRUE -7.734e-02 1.375e-02 -5.625 1.94e-08 ***
## userRatingCount 3.000e-05 5.832e-06 5.145 2.77e-07 ***
## acceptsCashOnlyTRUE -1.610e-01 5.200e-02 -3.096 0.001969 **
## deliveryTRUE 8.310e-02 2.377e-02 3.495 0.000477 ***
## acceptsNfcTRUE -5.816e-02 1.918e-02 -3.032 0.002443 **
## acceptsCreditCardsTRUE 8.808e-02 2.763e-02 3.188 0.001441 **
## servesWineTRUE 8.258e-02 2.368e-02 3.487 0.000492 ***
## servesCocktailsTRUE -6.878e-02 2.419e-02 -2.844 0.004476 **
## businessStatusCLOSED_TEMPORARILY 2.616e-01 1.343e-01 1.948 0.051425 .
## businessStatusOPERATIONAL 1.873e-01 1.300e-01 1.441 0.149657
## acceptsDebitCardsTRUE -7.897e-02 3.879e-02 -2.036 0.041791 *
## wheelchairAccessibleSeatingTRUE 2.594e-02 1.592e-02 1.630 0.103186
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.447 on 5827 degrees of freedom
## Multiple R-squared: 0.2077, Adjusted R-squared: 0.1978
## F-statistic: 20.93 on 73 and 5827 DF, p-value: < 2.2e-16
Conclusion
- Residual Standard Error (RSE) is 0.447 on 5827 degrees of freedom.
- Multiple R-squared (\(R^2\)) is 0.2077. The model explains 20.77% of the variance in rating.
- Adjusted R-squared (adjusted \(R^2\)) is 0.1978. This is a better measure for comparing models, because it penalizes the inclusion of irrelevant variables. Here it is only about 0.01 below \(R^2\), so the penalty for the 73 estimated coefficients is modest. A much larger gap would suggest that many predictors (likely the numerous primaryType dummy variables) add little and that the model is overfit.
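As a quick check on the figures above, the adjusted R-squared can be recomputed from the reported \(R^2\) using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1). In this sketch, n and p are read off the summary output (73 model and 5827 residual degrees of freedom), and the variable names are illustrative.

```r
r2 <- 0.2077   # Multiple R-squared from summary(step_forward)
p  <- 73       # number of estimated slopes (model df in the F-statistic line)
n  <- 5827 + p + 1   # residual df + slopes + intercept = 5901 training rows

# Adjusted R-squared shrinks R-squared according to the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

The result agrees with the 0.1978 reported by `summary()`, up to rounding of the inputs.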
Use the step_forward model to predict rating in RestaurantTest and calculate MAE and MSE.
# Predicting rating
ypred_forward = predict(object=step_forward, newdata = RestaurantTest)
# mean absolute error between predicted and actual rating
MAE(y_pred = ypred_forward, y_true = RestaurantTest$rating)
## [1] 0.3191636
# mean squared error between predicted and actual rating
MSE(y_pred = ypred_forward, y_true = RestaurantTest$rating)
## [1] 0.1854483
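One way to read the two test-set results side by side is to convert each MSE to a root mean squared error (RMSE), which is on the same scale as rating. The sketch below assumes the earlier MSE of 0.1849335 belongs to the model whose predictions are stored in ypred. The names `mse_earlier` and `mse_forward` are illustrative.

```r
# Test-set MSE values copied from the output in this section
mse_earlier <- 0.1849335   # MSE(ypred, ...) reported before the stepwise fit
mse_forward <- 0.1854483   # MSE(ypred_forward, ...) for the step_forward model

# RMSE is the square root of MSE, expressed in rating points
rmse <- sqrt(c(earlier = mse_earlier, forward = mse_forward))
print(round(rmse, 4))

# Absolute difference in MSE between the two models
print(round(mse_forward - mse_earlier, 6))
```

Both models miss by roughly 0.43 rating points on average in RMSE terms, so forward selection leaves test-set accuracy essentially unchanged.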